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Motivated by modeling and analysis of mass-spectrometry data, a semi- 
and nonparametric model is proposed that consists of a linear parametric 
component for individual location and scale and a nonparametric regression 
function for the common shape. A multi-step approach is developed that 
simultaneously estimates the parametric components and the nonparametric 
function. Under certain regularity conditions, it is shown that the resulting 
estimators is consistent and asymptotic normal for the parametric part and 
achieve the optimal rate of convergence for the nonparametric part when the 
bandwidth is suitably chosen. Simulation results are presented to demon- 
strate the effectiveness and finite-sample performance of the method. The 
method is also applied to a SELDI-TOF mass spectrometry data set from a 
study of liver cancer patients. 

KEY WORDS: Local linear regression; Bandwidth selection; Nonparamet- 
ric estimation. 
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are concerned with the following semi- and nonparametric regression model 
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where yu is the observed response from i-th individual {i = 1, . . . , n) at time t 
for {t = 1, ... ,T), Xit is the corresponding explanatory variable, and /3i are 
individual- specific location and scale parameters and m(-) is a baseline intensity 
function. Here, Eeu = 0, Var (en) = 1, and eu and Xu are independent. Of 
interest here is the simultaneous estimation of Oj, (3i and m(-). We shall assume 
throughout the paper that {i = 1, . . . ,n;t = 1, . . . , T) are independent and 
identically distributed (i.i.d.) with an unknown distribution function, though most 
results only require that the errors be independent with zero mean. 

Model ([T]) is motivated by analyzing the data generated from mass spectrom- 
eter (MS), which is a powerful tool for the separation and large-scale detection 
of proteins present in a complex biological mixture. Figure 1 is an illustration of 
MS spectra, which can reveal proteomic patterns or features that might be related 
to specific characteristic of biological samples. They can also be used for prog- 
nosis and for monitoring disease progression, evaluating treatment or suggesting 
intervention. Two popular mass spectrometers are SELDI-TOF (surface enhanced 
laser desorption/ionization time-of-fight) and MALDI-TOF (matrix assisted laser 
desorption and ionization time-of-flight). The abundance of the protein fragments 
from a biological sample (such as serum) and their time of flight through a tunnel 
under certain electrical pressure can be measured by this procedure. The y-axis of 
a spectrum is the intensity (relative abundance) of protein/peptide, and the x-axis 
is the mass-to-charge ratio (m/z value) which can be calculated using time, length 
of flight, and the voltage applied. It is known that the SELDI intensity measures 
have errors up to 50% and that the m/z may shift its value by up to 0.1%-0.2% 
(Yasui et al. 2003| ). Generally speaking, many pre-processing steps need to be 



done before the MS data can be analyzed. Some of the most important steps are 



noise filtering, baseline correction, alignment, normalization, etc. See, e.g., Guil- 



haus] ( [T995l ); |Banks and Petricoin| ( |2003| ); |Baggerly et al.| ( |2003| [20041 ) ; plamandis 



( |2004[ ); [Feng et al.| ( |2009| ). We refer readers to |Roy et aL] ( |2011| ) for an extensive 
review about the recent advances in mass-spectrometry data analysis. Here, we 
assume all the pre-processing steps have already been taken. 

In model ([T]), m(-) represents the common shape for all individuals while ctj 
and f3i represents the location and scale parameters for the i-th individual, respec- 
tively. Because m(-) is unspecified, model ([T]) may be viewed as a semiparam- 
eteric model. However, it differs from the usual semi-parametric models in that 
for model ([T|), both the parametric and nonparametric components are of primary 
interest, while in a typical semiparametric setting, the nonparametric component 
is often viewed as a nuisance parameter. Model ([T]) contains many commonly en- 
countered regression models as special cases. If all the parametric coefficients 
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Figure 1: Illustration of MS spectra. 




and fii are known, model ([T]) reduces to the classical nonparametric regression. On 
the other hand, if the function m(-) is known, then it reduces to the classical linear 
regression model with each subject having its own regression line. For the present 
case of ctj, (3i and function m(-) being unknown, the parameters are identifiable 
only up to a common location-scale change. Thus we assume, without loss of 
generality, that ai = and /3i = 1. It is also clear that for a^, /3j and m(-) to be 
consistently estimable, we need to require that both n and T go to oo. 

There is an extensive literature on semiparametric and nonparametric regres- 
sion. For semiparametric regression, [Begun et al. ( 1983| ) derived semiparamet- 
ric information bound while Robinson (1988]) developed a general approach to 
constructing i/n-consistent estimation for the parametric component. We refer 
to |Bickel et al.| (|1998|) and [Ruppert et al.| (|2003[) for detailed discussions on the 



subject. For nonparametric regression, kernel and local polynomial smoothing 
methods are commonly used ( |Rosenblatt[ |1956t |Stone[ |1977[ |1982[ |Fan[ |1993| ). 
In particular, local polynomial smoothing has many attractive properties includ- 
ing the automatic boundary correction. We refer to Fan and Gijbels ( [1996 ) and 
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Hardle et al.|(|2004]) for comprehensive treatment of the subject. 



The existing methods for dealing with nonparametric and semiparametric prob- 
lems are not directly applicable to model ([T]). This is due to the mixing of the 
finite dimensional parameters and the nonparametric component. A natural way 
to handle such a situation is to de-link the two aspects of the estimation through a 
two-step approach. In this paper, we propose an efficient iterative procedure, al- 
ternating between estimation of the parametric component and the nonparametric 
component. We show that the proposed approach leads to consistent estimators 
for both the finite-dimensional parameter and the nonparametric function. We also 
establish asymptotic normality for parametric estimator and convergence rate for 
the nonparametric estimation that is then used for optimal bandwidth selection. 



2 Main Results 

In this section, we develop a multi-step approach to estimating both the finite- 
dimensional parameters and /3j and the nonparametric baseline intensity m(-). 
Our approach is an iterative procedure which alternates between estimation of 
and /3j and that of m(-). We show that under reasonable conditions, the estimation 
for the parametric component is consistent and asymptotically normal when the 
bandwidth selection are done appropriately. The estimation of the nonparametric 
component can also attain the optimal rate of convergence. 

2.1 A multi-step estimation method 

Recall that if a-i and (3i were known, the problem would reduce to the standard 
nonparametric regression setting; on the other hand, if m(-) were known, it would 
reduce to the simple linear regression for each i. For the nonparametric regression, 
we can apply the local linear regression with the weights Kh{-) = K{-/h)/h 
for suitably chosen kernel function K and bandwidth h. For the simple linear 
regression, the least squares estimation may be applied. 

Not all parameters in model ([T]) are identifiable as at, (3i and m(-) are con- 
founded. To ensure identifiability, we shall set ai = and /3i = 1. Thus, for 
i = 1, ([T]) becomes a standard nonparametric regression problem, from which 
an initial estimator of m(-) can be derived. Replacing m(-) in ([T]) by the initial 
estimator, we can apply the least squares method to get estimators of (3i for 
i >2, which, together with ai = and /3i = 1 and local polynomial smoothing, 
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can then be used to get an updated estimator of m( ). This iterative estimation 
procedure is described as follows. 

(a) Set ai = and pi = 1, so that yu = m{xu) +ai{xit)eit, t ^1,...,T. Ap- 
ply local linear regression to {xu, yu),t — 1, . . . , T, to get initial estimator 
of m(-) 



ih{x) — 



where cc;u(a;) = Kh{xu - x){St,2 - {xu - x)St,i) and 

T 

ST,k = ^h{xu - x){xit - xf, for A; = 1, 2. 



(2) 



(3) 



(b) With m(-) being replaced by m(-) as the true function, ctj, i = 2, . . . , n 
can be estimated by the least squares method, i.e. 



XlJ=i[rfi(a;it) - m{xi.)]'^ ' 



Oii = yi. - Am(xi.) = yi. 



where 



Y!i=M^{xit) - m{xi)Y 
1 ^ 1 ^ 



(4) 



m[xi), (5) 



(c) With the estimates on and we can update the estimation of m(-) viewing 
OLi and Pi as true values. Specifically, we apply the local linear regression 
with the same kernel function K{-) to get an updated estimator of m( ). 



m{x) — 



it 



where = {yu - ai)/pi 



^iti^) ^PiKh*ixit- x) 



Pi^rJ i^it x) PiSq^i 



(6) 



2 Q*{i) 



i=l 



i=l 



(J) 
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and 



T 

StS = Yl ^f^' - ^)(^^* - ^)''' f^'" ^ = 1' 2. (8) 

t=i 

Note that the bandwidth for this step, h*, may be chosen differently from h 
in order to achieve better convergence rate. The optimal choices for h and 
h* will become clear in the next subsection where large sample properties 
are studied. 

(d) Repeat steps (b) and (c) until both the parametric and the nonparametric 
estimators converge. 

Our limited numerical experiences indicate that the final estimator is not sen- 
sitive to the initial estimate. However, as a safe guard, we may start the algorithm 
with different initial estimates by choosing different individuals as the baseline in- 
tensity. In step (c), the [5i is in the denominator, which, when close to 0, may cause 
instability. Thus, in practice, we can add a small constant to the denominator to 
make it stable, though we have not encountered this problem. 

The iterative process often converges very quickly. In addition, our asymp- 
totic analysis in the next subsection shows that no iteration is needed to reach the 
optimal convergence rate for the estimate of both parametric and nonparametric 
components when the bandwidths of each step are properly chosen. Therefore, 
we may stop after step (c) to save computation time for large problems. 

2.2 Large Sample Properties 

In this section, we study the large sample properties of the estimates for m( ), aj 
and Pi. By large sample, we mean that both n and T are large. However, the size 
of n and that of T can be different. Indeed, for MS data, T is typically much larger 
than n. The optimal bandwidth selection in the nonparametric estimation will be 
determined by asymptotic expansions to achieve optimal rate of convergence. We 
will also investigate whether or not the accuracy of 6ii and /3j may affect the rate 
of convergence for the estimation of m(-). 

The following conditions will be needed to establish the asymptotic theory. 

CI. The baseline intensity m( ) is continuous and has a bounded second order 
derivative. 
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C2. There exist constants a > and S > 0, such that the marginal density /(■) 
of Xit satisfies f{x) > 5, and \f{x) — f{y)\ < c\x — y\"' for any x and y in 
the support of /(■). 

C3. The conditional variance (jf{x) = Var {yulxu = x) is bounded and contin- 
uous in X, where i = 1, . . . ,n and t = 1, . . . , T. 

C4. The kernel K{-) is a symmetric probability density function with bounded 
support. Hence K{-) has the properties: K{u)du = 1, uK{u)du = 
0, v?'K{u)du 7^ and bounded. Without loss of generality, we could 
further assume the support of K{-) lies in the interval [—1, +1]. 

Condition CI is a standard condition for nonparametric estimation. Condition C2 
requires that the density of xu is bounded away from 0, which may be a strong 
assumption in general but reasonable for mass spectrometry data as xu are approx- 
imately uniformly distributed on the support. In addition, the density is assumed 
to satisfy a Lipschitz condition. Condition C3 allows for heteroscedasticity while 
restricting the variances to be bounded. Condition C4 is a standard condition for 
kernel function used in the local linear regression. 

The moments of K and are denoted respectively by ni = v}K{u)du 
and vi = u^K'^{u)du for / > 0. 

Lemma 1. Suppose that Conditions C1-C4 are satisfied. Then for m(-) defined 
by (|2]), we have, as h and Th — )■ oo, 

m(x) = m{x) + -m" (x) fi2h'^ + o{h^) + Ui{x), (9) 

whereUi{x) = iYlt=i^isix)ai{xis)eis)/iYlt=i^isix)). 

Lemma [T] allows us to derive the asymptotic bias, variance and mean squared 
error for the estimator m(-). This is summarized in the following corollary. 

Corollary 1. Let X denote all the observed covariates {xit,i = I, . . . ,n,t = 
1, . . . , T}. Under Conditions C1-C4, the bias, variance and mean squared error 
ofrh{x) conditional on X have the following expressions. 

E (m(x) — m(x) |X) = -m"(x)/i2^^ + o{h'^), 

Var(m(a;)|X) = -L[/(a;)]-V?(x)z/o + o(^) , 
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It is clear from the above expansions that in order to minimize the mean 
squared error of m(x), the bandwidth h should be chosen to be of order T^^/^. 
However, we will show later that this is not necessarily optimal for our final esti- 
mator m(-). 

For estimation of scale parameters we can apply Lemma [T] together with 
the Taylor expansion to derive asymptotic bias and variance. In particular, we 
have the following theorem. 

Theorem 1. Suppose that Conditions C1-C4 are satisfied and that h ^ is 
chosen so that Th — t- oo. Then the following expansions hold for i >2. 



E (A - A|X) = -P,(h'P, + ^Q?j + o(/i2 + J^), (10) 



where Wu = m{xit) - m{xi.),m{xi.) = T ^ Ylt=i ^i^u 



' 5 



It 



Remark 1. The asymptotic bias and variance of parameter estimator ai can be 
similarly derived. In fact, they can be inferred from the bias and variance of f^i 
through its linear relationship with Pi, thus having the same order as those of /3i 
in (fTOl) and (fTTl). 



Remark 2. The bias of Pi is of the order fi^ + {Th)~^ and the variance is of the 
order T~^. To obtain the -s/T -consistency for Pi, i.e. \/T{Pi — Pi) = Op{\), the 
order of bias should be 0(T~^/^). This is achieved by choosing h to be between 

T-^l^ andT-^'\ 

From the asymptotic expansion for the mean and variance of the initial func- 
tional estimator m(-) and parameter estimator Pi, we can obtain the asymptotic 
expansions for the bias and variance of the subsequent estimator of the basehne 
intensity, m(-). 

Theorem 2. Suppose that Conditions C1-C4 are satisfied. Suppose also that h 
for rh{-) and h* for m(-) are chosen so that /i — )■ 0, /i* — 0, Th — )■ oo, and 
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nTh* — !■ oo. Then the following expansions hold: 

E {m{x) — m(x) |X) 



+ y^/i*2 + 0(^/1' + — + h*^ 



Var (mfx) X) 



1 



T 



+ 



2\2 



1 

+ 



1 1 
T nT/i* 



where Pi,Qi,Wit are the same as those in Theorem^ and Ri = m{xi.)Pi — 

In the ideal case when the location-scale parameters are known, the bias and 
variance of the local linear estimator of baseline intensity m(-) should be of the or- 
der 0(/i*^) and 0(^^7j^). And the optimal bandwidth in this ideal case should be 
of order (nT)^i. Therefore the bias and variance of the nonparametric estimator 
are 0((nT)^5 ) and 0((riT)^5 ), respectively. In addition, the mean squared error 
is of order 0{{nT)^5^. Interestingly, by choosing the bandwidths h and h* sep- 
arately, we can achieve this optimal rate of convergence for the baseline intensity 
estimator m(-) through the proposed multi-step estimation procedure when the 
orders of n and T satisfy certain requirement. Notice that the parametric compo- 
nents will have the optimal convergence rate simultaneously. The conclusions 
are summarized in the following theorem. 

Theorem 3. Suppose that Conditions C1-C4 are satisfied. The optimal paramet- 
ric convergence rate of location-scale estimators can be attained by setting h to 
be of order T^s; the optimal nonparametric convergence rate of the baseline in- 
tensity estimator m(-) can be attained by setting h* to be of order {nT)^5 and h 
of order T~3, when T — > 00, n — )■ 00, and n = 0{T^^^). 

Remark 3. It is clear from Theorem^that if the requirement n = 0(T^/^) is not 
satisfied, then the nonparametric estimator m(-) will not achieve the optimal rate 
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of convergence at any choice of the bandwidths. However, the choice ofh and h* 
is optimal even ifn = 0(T^/^) does not hold. 

Theorem 4. Suppose that Conditions C1-C4 are satisfied. In addition, assume 
E[rri^{xit){(y'f{xit) + 1)] < oo and Elm^i^x a)] > for all i = l,...,nandt = 
1, . . . , T. If we restrict the order of h to lie between T^a andT~i, f3i is asymptotic 
normal: 

Vr(A-/3)-^iV(0,af), (12) 



where 



T^ooy (T-i Eti w^?)^ (T-^ELw^)^ 

Here, if we assume (■) to be a constant for each subject i, then its value can 
be consistently estimated by the plug-in estimator 



T 

a] = Y^^yu -a,- Am(x,^))^ (13) 
t=i 



where ai = and /3i = 1. 

From ( [T2] ), the asymptotic variance of /3j is of order 0{T^^), provided that the 



order of the bandwidth h is properly chosen. Since the asymptotic expansion for 
/3i does not involves the choice of h*, the specific choice of different h* will not 
affect the order of the asymptotic variance of (3i. 

2.3 Bandwidth Selection 



In Section 2.2 we studied how the choice of bandwidths h and h* may affect the 
asymptotic properties of the estimators. However, in practice, we need a data- 
driven approach to choosing the bandwidths. Our suggestion on this is to use a 
i^-fold cross-validation bandwidth selection rule. 

First, we divide the n individuals into K groups Zi, Z2, . . . , randomly. 
Here, Z^ is the k-th test set, and the k-th training set is Z_fc = {{1, . . . , n}\Zk}. 
We estimate the baseline curve m(-) using the observations in the training set Z_fc 
and denote the estimator as m(Z^k, h, h*), where h and h* are the bandwidths of 
the two nonparametric regression steps for m(-) and m(-), respectively. Recall that 
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at the beginning of the multi-step estimation procedure, we fix the first observation 
as the baseline to solve the identifiability issue. In the case of cross-validation, for 
each split, the baseline will corresponds to the first observation inside Z_fc, which 
is different for different k. We circumvent the problem of comparing different 
baseline estimates by using them to predict the test data in Z^, i.e., after obtaining 
the estimator of baseline curve from Z_fc. We then regress it on the data in Z^, 
and compute the mean squared prediction error (MSPE). 

1 ^ 

MSPE{Z.k, h, h*) = -J2 J2^y^t - + 4^mt(Z_fc, h, (14) 

ieZfe t=i 

where dki and are the estimated regression coefficients. We repeat the calcu- 
lation for k = 1, . . . , K, and the optimal pair (h, h*) is the one which minimizes 
the average MSPE, i.e. 

1 ^ 

{h, h*) = arg min — V MSPE{Z^k, h, h*). (15) 

(h,h*) I\ 

The effectiveness of the cross-validation will be evaluated in Sections [3] and |4l 

3 Application to Mass Spectrometry Data 

We now apply the proposed multi-step method to a SELDI-TOF mass spectrom- 
etry data set from a study of 33 liver cancer patients conducted at Changzheng 
Hospital in Shanghai. For each patient, we extract the mjz values in the region 
2000-10000 Da, which is believed to contain all the useful information. Figure |2] 
contains the curves of 10 randomly picked patients. 

There are some noticeable features in the data. All curves appear to be contin- 
uous. They peak simultaneously around certain locations; at each location, curves 
have the same shape but with different heights. All those features are captured 
well by our model. 

Since the observed values oimj z for each person may fluctuate, we need to 
perform registration to make the analysis easier. Here, we use the observations 
from the first individual and set his/her mjz values as the reference. Then we use 
the linear interpolation method to compute the intensities of all the other individ- 
uals at the reference mjz locations. After that we get the preprocessed data which 
has the same mjz values for each observation. 
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Figure 2: Curves of 10 observations and the baseline estimate 
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We use the cross-validation method described in Section 2.3 to select the op- 
timal bandwidths with K = 33, i.e., leave-one-out cross validation. We compute 
the MSPE at the grid of h = 2, 4, 6, . . . , 40 and h* = 2, 4, 6, ... , 40. Table [T] 
contains a portion of the result with h = 30,32, ... , 40 and h* = 2, 4, 6, ... , 20. 

As we can see in Table [T} the minimum MSB occurs at the location of /i = 34 
and h* = A, which agrees with our theory that h and h* should not be chosen with 
the same rate for the purpose of estimating the nonparametric component. 

Finally, we use the selected bandwidths to estimate the location and scale pa- 
rameters as well as the nonparametric curve for the whole data set. The estimated 
parameters are reported in Table [2] and the baseline nonparametric curve estima- 
tion is shown in the lower part of Figure [2j From Table [2} we can see that each 
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Table 1 : MSE of the leave-one-out prediction of real data 
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individual has very different regression coefficients, which was also verified by 
looking at Figure [2} In addition, comparing the estimated curve for the baseline 
intensities with the real curves of 10 observations, it is clear that the majority of 
the peaks and shapes are captured by the nonparametric estimate with appropriate 
degree of smoothing. 

4 Simulation Studies 

We conduct simulations to assess the performance of the proposed method for pa- 
rameter and curve estimation. The true curve m(-) is chosen from a moving aver- 
age smoother of the cross-sectional mean of a fraction of real Mass Spectrometry 
data in Section [3] after log transformation. We set 10000 m/z values equally- 
spaced from 1 to 10000 (T = 10000) and the number of individuals n = 30. The 
true values of the parameters ai, /3i,i = 1,2, ... ,n for each individual are shown 
in Tablejs} And the error terms eu are sampled independently from A^(0, a^) with 
a = 0.25. 

We apply our multi-step procedure to the simulated data with different choices 
of the bandwidth. The number of runs is 100. The estimated parameters aj and /3j 
are shown in Table |3] along with the standard errors. We set h = 35, which leads 
to the smallest MSE of m shown in Table IH It is evident that the estimation is 
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Table 2: Regression parameters of real data 
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very accurate for all the location and scale parameters. A graphical representation 
of the raw curve of the 16th subject along with estimates derived from m(-) and 
m(-) can be found in Figure |3] We can see from the figure that the estimate from 
m(-) is notably better than that from m(-), which shows that multi-step procedure 
is effective in improving the estimates for the baseline curve. We observed similar 
phenomenon for all the other subjects. 

From Table |4} we can see that the global optimal bandwidths are h = 25, h* = 
36. It is interesting to note that the optimal bandwidth for m(-) is h = 35, which 
is different from the optimal bandwidth for the final estimator. 

To evaluate the quality of the our multi-step estimation method for the non- 
parametric baseline function, we consider a classical nonparametric estimation on 
another set of data where the same true function m(-) is used but = 0, /3i = 1 
for alH = 1, . . . ,n. We applied the same local linear estimation with different 
bandwidths. The mean MSE of the estimated m(-) from 100 repetitions for differ- 
ent hs are given in Table[5} When we applied the multi-step estimation procedure, 
the best mean MSE we achieved in Table|4]is very close to the minimal mean MSE 
0.4442 for the oracle estimator. This comparison confirms that there is little loss 
of information in the proposed method when both parametric and nonparametric 
components are estimated simultaneously. 

We use cross-validation to get a data-driven choice of the bandwidths. Here, 
we set = 5 to get a mean MSPE of every different bandwidth choices of both 
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Figure 3: Nonparametric estimates of the curve from m(-) and m(-) for the 16th 
object. 
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Steps over 100 runs, and the optimal bandwidths are those with the minimum mean 
MSPE. The mean MSPE values are shown in Table [6j from which we can see that 
the smallest value is located ath = 25, h* = 36, which is quite close to the optimal 
bandwidths h = 25 and h* = 38 in Table |4| Therefore, the cross-validation idea 
appears to work well in terms of selecting the best bandwidths. 



5 Discussion 

This paper proposes a semi- and nonparametric model suitable for analyzing the 
mass spectra data. The model is flexible and intuitive, capturing the main feature 
in the MS data. Both the parametric and nonparametric components have natural 
interpretation. A multi-step iterative algorithm is proposed for estimating both the 
individual location and scale regression coefficients and the nonparametric func- 
tion. The algorithm combines local linear fitting and the least squares method, 
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Table 3: Regression parameter estimates {h = 35) 



ID 


a 


/3 


a 


/3 


ID 


a 


/3 


a 


/3 


1 





1 


0.000(0.000) 


1.000(0.000) 


16 


0.6 


1 


0.605(0.026) 


0.997(0.019) 


2 


0.2 


0.2 


0.202(0.020) 


0.198(0.013) 


17 


0.8 


0.2 


0.799(0.020) 


0.201(0.014) 


3 


0.4 


0.5 


0.399(0.023) 


0.501(0.016) 


18 


1 


0.5 


1.004(0.024) 


0.497(0.016) 


4 


0.6 


1.5 


0.598(0.040) 


1.502(0.027) 


19 





1.5 


-0.001(0.044) 


1.501(0.029) 


5 


0.8 


2 


0.801(0.047) 


2.000(0.031) 


20 


0.2 


2 


0.206(0.043) 


1.997(0.029) 


6 


1 


1 


0.999(0.029) 


1.001(0.020) 


21 


0.4 


1 


0.403(0.030) 


0.998(0.021) 


7 





0.2 


-0.001(0.022) 


0.201(0.015) 


22 


0.6 


0.2 


0.598(0.024) 


0.201(0.016) 


8 


0.2 


0.5 


0.200(0.021) 


0.500(0.014) 


23 


0.8 


0.5 


0.801(0.024) 


0.500(0.016) 


9 


0.4 


1.5 


0.404(0.033) 


1.498(0.023) 


24 


1 


1.5 


0.996(0.038) 


1.503(0.025) 


10 


0.6 


2 


0.603(0.044) 


1.999(0.030) 


25 





2 


0.001(0.044) 


2.000(0.029) 


11 


0.8 


1 


0.800(0.026) 


1.001(0.018) 


26 


0.2 


1 


0.203(0.030) 


0.998(0.020) 


12 


1 


0.2 


1.002(0.021) 


0.198(0.014) 


27 


0.4 


0.2 


0.399(0.021) 


0.201(0.015) 


13 





0.5 


0.003(0.023) 


0.499(0.016) 


28 


0.6 


0.5 


0.604(0.021) 


0.497(0.014) 


14 


0.2 


1.5 


0.206(0.036) 


1.497(0.024) 


29 


0.8 


1.5 


0.803(0.032) 


1.499(0.023) 


15 


0.4 


2 


0.401(0.048) 


2.001(0.033) 


30 


1 


2 


1.001(0.047) 


2.000(0.031) 



* Standard deviations are in parentheses 



both of which are easy to implement and computationally efficient. Both sim- 
ulation studies and real data analysis demonstrate that the proposed multi-step 
procedure works well. 

The local linear fitting for the nonparametric function estimation maybe re- 
placed with other nonparametric estimation techniques. Because the location and 



scale parameters are subject specific, the empirical Bayes method (Carlin and 



Louis 2008 1 may be used. In addition, nonparametric Bayes may also be applica- 



ble with the nonparametric function being modeling as a realization of Gaussian 
process. 

The proposed model and the associated iterative estimation method do not ac- 
count for the random error in the measurement of X. It is desirable to incorporate 
the measurement error into the model ( [Carroll et al.[ |2006| ) . 

Many studies involving MS data are aimed at classifying patients of different 
disease types. The information of peaks are usually applied as the basis of the clas- 
sifier. The proposed method provides a natural way of finding the peaks for dif- 
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Table 4: MSE of the initial and updated estimation of m 



MSE of m 



h 


20 


25 


30 


35 


40 




9.1112 


7.5509 


6.7024 


6.3418 


6.3653 


MSE of m 




20 


25 


30 


35 


40 


20 


0.6145 


0.5936 


0.5925 


0.6078 


0.6388 


22 


0.5762 


0.5563 


0.5561 


0.5723 


0.6042 


24 


0.5453 


0.5265 


0.5272 


0.5443 


0.5771 


26 


0.5204 


0.5026 


0.5043 


0.5223 


0.5561 


28 


0.5005 


0.4838 


0.4866 


0.5056 


0.5403 


30 


0.4850 


0.4695 


0.4733 


0.4934 


0.5291 


32 


0.4735 


0.4592 


0.4641 


0.4852 


0.5220 


34 


0.4657 


0.4527 


0.4587 


0.4809 


0.5188 


36 


0.4612 


0.4496 


0.4568 


0.4802 


0.5192 


38 


0.4601 


0.4498 


0.4583 


0.4829 


0.5231 


40 


0.4622 


0.4533 


0.4631 


0.4889 


0.5303 



ferent group of patients by use the multi-step estimation procedure on each group 
and find out the corresponding nonparametric baseline function. From the esti- 
mated baseline function, the information of peaks can be easily extracted, which 
can then be used for classification. 
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Table 5: MSE of the estimation of m in the dataset with same parameters in each 
individual 



h 


20 


30 


40 


50 


60 


MSE 


0.6881 


0.4936 


0.4442 


0.4926 


0.6337 



Table 6: Mean MSPE of the 5-fold CV over 100 times. Here, we multiply T for 
all the MSPE values and subtract the minimum 625.61956. 



\. h 


20 


25 


30 


35 


40 


20 


0.293733 


0.293728 


0.293724 


0.293722 


0.293721 


22 


0.222948 


0.222943 


0.222940 


0.222939 


0.222939 


24 


0.165816 


0.165813 


0.165810 


0.165809 


0.165809 


26 


0.119253 


0.119250 


0.119248 


0.119248 


0.119248 


28 


0.081603 


0.081600 


0.081599 


0.081598 


0.081599 


30 


0.052297 


0.052295 


0.052294 


0.052294 


0.052295 


32 


0.030256 


0.030255 


0.030254 


0.030255 


0.030256 


34 


0.014621 


0.014620 


0.014620 


0.014621 


0.014622 


36 


0.004761 


0.004760 


0.004760 


0.004761 


0.004763 


38 


3.4e - 08 





5.6e - 07 


2.0e - 06 


4.2e - 06 


40 


0.000590 


0.000590 


0.000592 


0.000593 


0.000596 



Appendix 

The Appendix contains proofs of Lemma 1, Corollary 1 and Theorems 1-4. We 
begin with some notation, which will be used to streamline some of the proofs. 
Because all asymptotic expansions are derived with x^s being fixed, we will, 
for notational simplicity, use E to denote the conditional expectation and Var to 
denote the conditional variance given xu's throughout the Appendix. For i — 
1, . . . , n and i = 1, . . . , T, let 

. . _ ^ _ ai{Xit)Witrh{xi.) 

Z_/s=l IS 
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5.1 Proof of Lemma [T] 

Proof. It follows from ([3]) and the definition of Wa that 

T 

^Uit{x){Xu -X) = St,2St,1 - St,iSt,2 = 0. 

t=l 

From Condition CI, we have 

^(a^it) = itl{x) + 'm'{x){xit — x) -\ — 'm"{x){xit — x)^ + o{{xit — x)'^), 
where o(-) is uniform in t. Thus 

Y.J=i^uix)[m{x) + |m"(x)(xu - x)^ + o((xu - x)^)] 



mix) 



+ 



Ef=i^it(a^) 
X;f=ituit(a;)cri(xit)eit 



= + ^ =;f 7-, + 

= ^(X) + ' Q2 N + 0( ) + (16) 

where the last equality follows from the definition of j, j = 0,1,2,3. 

A standard asymptotic expansion for the local linear smoothing (Fan and Gij- 
bels, 1996, eq 3.13) results in 

St,j = Th^f{x)fi,{l + op{l)},j = 0, 1, 2, 3. (17) 



Note that with j = and 1 in ([17]), we have, 5*^,0 = Tf{x){l + Op(l)) and 
'S'r,i = Op{l) since /io = 1 and /^i = 0, combined with ([16]), 



m{x) = m{x) + -m"(x)/i2/i^ + o{h'^) + Ui{x). 
This completes the proof of Lemma[T] □ 
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5.2 Proof of Corollary HI 

Proof. Being a weighted average of mean-zero random variables, Ui{x) has zero 
mean. Thus, from Lemma 1, we have 

E [m(x) — m[x)] = -m" {x) fi2h'^ + o{h'^). 
For the variance term, from the definition of m(-), we have 



Var (m(x)) = Var 



Var 



It 



Yld=i^u{x)cFi{xit)e 

St,2 Yh=i ^hjxit - x)ai{xu)eit - St,i Kh{xit - x){xit - x)ai{xit)eu 

St,oSt,2 — S^i 



f Et=i Khjxit - x)ai{xit)eu \ ^ ^ f f J2t=i ^h{xit - x)(Ji{xit)eit 



St,o J \ \ ^T,0 



^[/Wl-V?(.).„ + „(2. 



where the third equation follows from ( 17 1, and the last equation follows from Condition 



C3 and ( 17 1. Combining the above asymptotic expansions for the bias and variance terms 



leads to the desired expansion for the mean squared error. □ 

5.3 Proof of Theorem 1 

Proof. First of all, define Wu = fh{xit) — m{xi) to simplify the presentation. By 
definition, we have the following expansion for (3i when i > 2. 

o_n _ n Y.J=i Wu{m{xit) - mjxit)) YlJ=i Wit(Ti{xit)eit 

= /3iD, + G,. (18) 
From Lemma 1 and the proof of Corollary [1] we have 

mt-Wu = Op{h^). (19) 
Plugging (|9]) into Di, we have 
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Yl=i{\Wu + 0{h'')]U,{xu) + 0(/i^)[[/i(a;,,) - Ui{x,.)]} 

Ef=i 



Eti Eti 

+ o(/i2 + ^), (20) 



where the last asymptotic expansion follows from ([19]). Similarly for Gi, we have 

^ EL^^^^^(^»f)g^^(l + Q(^')) EL(^i(^»f)-^i(3^^-))^i(^»t)eit 

= ^^^^^^(1 + 0,(1)). (21) 

We observe that for any i > 2, Ui{xu) is a linear combination of {eit,t = 
1, . . . , T}. Therefore, Ui{xit) is independent of {e^, i = 2, . . . ,n,t = 1, . . . , T}. 
By using the tower property, we have EGi = 0. Therefore, (3iDi is the only part 
that contributes to the bias of In view of these and Corollary [l| we have the 
following expansions for the bias and variance terms 

E (ft _ . _ft,^P. _ ft^lXW_W ^ ^ 1 ) 



and 



-P,h'P,-P~Q, + o{h' + ^), 



Var (A) =Var | + 



Et=iW^ 



+Var I EL^.ta.(x.,)6., 
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Straightforward variance calculation for an independent sum gives 



T T 



Var Yl ^Mxit) ]=J2[J2^^t 

\t=l I s=\ t=l 



alixu). (22) 



We have 



Et / \ -I ■.\ Kh{Xit — Xis){St,2 — (Xit — Xis)St,i) 
[m{xit) - m{xi.)) — — -2 '—. 



We expand m{x) in the neighborhood of point Xis using Taylor's expansion, 

m{xit) = m{xis) + {xit - xu)m'{xu) + ^{xu - xisfm"{xis) + Op{{xit - xu)'^). 

Since the kernel function Kh{x — Xis) vanishes out of the neighborhood of Xis 
with diameter h, we can obtain the following 

^ Kh{Xit - Xis){St,2 - {Xit - Xis)St,i) 



C C C2 
OT,0^T,2 — ^T,l 



Y m{xit) - 
t=i 

bT,0^T,2 - Jt,1 

T 

^m{xu) + 0,{h') V K^^^^t-^^s){ST,2-{xu-x,,)Sr,^ 

^ bT,0^T,2 - 

=m{xis) + Op{h^), 

where the functions Sr^k, A; = 0, 1, 2 are evaluated at the point xu. Combined with 
rh{xi.) = m{xi.) + Op(T~^/^), we can have the expansion 

22 



Then recall (|22l), wehave Var {Y.J=i WuUi{xu)) = Ef=i Wlal{xit)+Op{T), 



which leads to the variance expansion 



□ 



5.4 Proof of Theorem 2 

Proo/ Recall and ([8]), we have 

n T n n n n 

i=l t=l i=l i=l i=l j=l 

Then we have the asymptotic expansion of the updated estimator of baseline 
intensity m(-) at time point x as follows. 
By definition of m(-) in (|6]) , we can write 

m(x) — m{x) 

Er=i Ef=i - , EILi Er=i ^rtf^(a;it)ei/ A 



+ 



^Er=iEf=i^>M(A-A)/A 



From the proof of Theorem [T| we have 

«_« Ru2p. a Tj=iiUi{xit) - Ui{xi.))Ui{xit) ^ Tj=iWitUi{xit) , ^ ,,,, 

+ {l + o,{l))+o[h +-). (2A) 

/-^t=i it 
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Then, from the least square expression, we have the asymptotic expansion for Oj as fol- 
lows. 



ai = Vi- - (3im{xi.) 



ai + j3im{xi.) + ei. - I3i[m{xi.) + ^m"{xi.)h^ + Ui{xi.) + o{h^)] 



Ui + I3ih Ri + m{xi.)l3, 

T T 



Et=iiUiM-Uiix,.))Uiixit) 



+ o h'' + 



Th 



+ ^it^iti^ + VitUl{x^t){l + Op{l)). 



(25) 



t=i 



t=i 



Now, we plug the above asymptotic expansions ( 24 1 and ( 25 1 into the right hand side 



of ( [23] ). The first part of ( [23] ) could be expanded as follows 

Er=iEr=i^?t(«i-«o/A 
Er=i Et=i 



Er=i ELi(«i - * (ifKh*{xit - x) 


Ei=l ^i'^rj i^it x) Ei=l ^i^T,l 


EtiEj=J!K,.ixu-x) 


ELi Pfs^S - [xu - X) u=i Pls<i 









U=J!EJ=iK,.{xu-x) 


Ej=i l^i^rj (^it x) Yli=i Pi^rJ 





(26) 



The numerator of ( |26| ) has expansion 

n 

Tf{x){l + Op{l)} ^fTh*'f{x)M^ + Op{l)} 

n . 
Th*{h*f'{x)f,2 + Op{h*' + —=)} Y PfTh*{h*f'{x)fi2 + Op{h*' + —=)} 

i=l 



i=l 



i=l 



T\*^ 5^(a. - a.) A f{x){l + Op(l)} ^ ^!f{x)fi2{l + Op(l)} 



i=l 



n ■ 

{h*f'{x)fS2 + Op{h*' + — )} Y MWnx)li2 + Op{h*' + -j==)} 



i=l 



4 = 1 



(27) 
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where the last equation following from h = Pi + 0{h^) + 0{{Thy^) + Op{T~^/'^). 
Similarly, the denominator of ([26]l has the following expansion 



A' Tf{x){l + Op(l)} A'rr2/(x)/i2{l + Op(l)} 
i=l L i=l 

- Th*{h*nx)f,2 + Opih*^ + -L=)} Y PfTh*{h*nx)f,2 + Op{h*^ + -L= 

i=l 

n r- n 

--T^h*^ Y + Op{l)} Y /3'/(^)/«2{l + o,(l)} 

Vrh* 



)} 



i=l 



i=l 



- {h*nx)f,2 + Op{h*' + -L=)} Y ^nh*f'ix^)^^2 + Op{h*' + -^)} 

«=1 



1=1 



i=l 



(28) 



Then combining the expansions ( |27| ) and ( [28| ), we have the following expansion for 
the first part of (|23]l. 

ELi Er=i - 





OA 


/(^)Er=i/3?/(x)/^2{i+op(i)} 




/(^)Er=i/?f/(^)M2{i+o,(i)} 





For other parts of pS] ), we can apply the same techniques for expansion. As a result, the 
following expansion of rh holds. 
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m(x) — m{x) 



(l + Op(l)) + 



Er=i/3fr/(x) 



+ 



Er=i/3fr/(x) 

n r T _ T T 

1=2 t=l t=l t=l 



ELi/3f 



+ 



EILi/3f 



+ m(x) 



+ m(x) 



Er=i/3f 



+ 



EILi/3f 



T 



Then it is straightforward to derive the bias of rh{x) as follows 



E (m{x) — m{x)) 



ELi 



m{x) 



For the variance of m{x), we notice that the error terms {en, i = 1, . . . , n, t = 
1, . . . , T} are independent, which implies the independence of e^t, i = 2, . . . , n 
and Ui{xit). Therefore, we have the following asymptotic expansion for the vari- 
ance. 
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Var (m(x) — m{x)) 



Var 



+ 



I Ei=l Et=l l^i^h* {xjt — x)e it 
T.i=il3!Tj=iKh*ixu- X 

Var 



1 1 
o{- + 



T nTh* 



1 



+ 



where the expansions follow similar techniques as ( [27] ) and ([28]). Now, by the 
definition of f/i, we have 

Var (m(x) — m{x)) 



(EILi/3f)^ 
+ f 4 + 



=1 j=2 



1 



T nTh* 



□ 



5.5 Proof of Theorem 3 

Proof. From the results of Theorem 2, it is straightforward to show that the order 
of the mean squared error of rn{x) is h^ + (T^/i^)-! + h*"^ + T"^ + {riTh*)-^. 
To minimize the mean squared error, we can taken h = 0{T-^^^) and h* = 
0{{nT)^^/^). Under such choices of h and h*, the order of the mean squared 
error is {nT)-'^l^ + T~^ . 

Therefore, to match the optimal nonparametric convergence rate [nT]^^/^ for 
mean squared error, the condition n = 0(T^^^) is required. □ 



5.6 Proof of Theorem 4 



Proof. We start from the asymptotic expansion from ( [24] ) in the proof of Theorem 
2. First, we investigate the asymptotic behavior of the third term on the right hand 
side of (IM. 
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As a first step, we have 

T 



.t=i 



t=i 



(29) 



Now, following the definition of Ui{-) and applying the same expansion of 
i^is{xit) as in the proof of Theorem 1, 



E 



.t=i 

■ T 

■ t=i 



Tfixu) 



where the last inequality follows from exchanging the summation order and the 
property of the kernel function K{-). Observe that /(■) is bounded from below by 
Condition C2, the following inequality sequence is obtained. 



E 



^ s,u=l ' 



< 



0(1) 



s,u=l 



{|a;is-a;i„|<2ft,}) 



where the last term has the order of 0{h ^) by noticing 

T T 

Yl k\xis-xiu\<2h} = 5]]4T/i/(xi,). 

s,u=l s=l 

We can also derive the order of the variance for the other two terms. 



T 



Var -A 



+ 



Ell wi 



0{T-'). 
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Due to the relationship of h and T, the third term is negligible when calculating 
the asymptotic variance. Then, the expansion for the bias of (3i can be rewritten as 
follows 



where the right hand side is an independent sum of random variables with their 
variances being of the same order, 0(T^^). As a result, the central limit theorem 
can be applied directly for j3i. 

Vf0, - A - P,{h'P, + ^Q,)] iV(0, af ), 

where the asymptotic variance cr*^ is finite with the following expression. 



a 



*2 



lim 



(T-' Ef=i w^,y 



Notice that if the order of h is between T 2 and T 4 , then /3j is asymptotic unbi- 
ased since VTl3i{h^Pi + ^Qi) ^'^ 0. 

From Theorems [l] and [2] we have dj, m(-) are consistent estimators of 
A, ai, m(-), respectively. Thus, af = |; Y^^iiVit - oti - f5im{xit)f is also con- 
sistent for the variance under the assumption that crj(-) is a constant function for 
each subject i. □ 
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